    Multilingual Unsupervised Sentence Simplification

    Progress in sentence simplification has been hindered by the lack of supervised data, particularly in languages other than English. Previous work has aligned sentences from original and simplified corpora such as English Wikipedia and Simple English Wikipedia, but this limits corpus size, domain, and language. In this work, we propose using unsupervised mining techniques to automatically create training corpora for simplification in multiple languages from raw Common Crawl web data. When coupled with a controllable generation mechanism that can flexibly adjust attributes such as length and lexical complexity, these mined paraphrase corpora can be used to train simplification systems in any language. We further incorporate multilingual unsupervised pretraining methods to create even stronger models and show that by training on mined data rather than supervised corpora, we outperform the previous best results. We evaluate our approach on English, French, and Spanish simplification benchmarks and reach state-of-the-art performance with a fully unsupervised approach. We will release our models and code to mine the data in any language included in Common Crawl.

    Teaching digital and global law for digital and global students: creating students as producers in a Hong Kong Internet Law class

    In an increasingly globalised and digitalised society and economy, legal education needs to foster a different skill set among students from that taught traditionally. Law students need practice in responding to a variety of scenarios and contexts, as well as in developing creative and critical thinking skills. The "student as producer" approach provides opportunities to build such skills by having students produce work that could benefit their classmates and future cohorts, and contribute to the discipline's knowledge base. We present a case study of a final-year undergraduate law course, Internet and the Law, at the Chinese University of Hong Kong, where we used the "student as producer" approach, collaborated with external organisations, and used digital tools to foster global and digitally savvy law students. Using a mixed-methods approach, we highlight successes and limitations of using the "student as producer" approach, digital tools, and an internationalised curriculum in our law classroom. Overall, students and staff found the approach successful in providing global and digital law students with practical skills. We also identified limitations and challenges to be addressed in future projects. Our findings speak to broader themes of active engagement, contributions, and practical knowledge for law students in their learning and future careers.

    Learning to Speak and Act in a Fantasy Text Adventure Game

    We introduce a large-scale crowdsourced text adventure game as a research platform for studying grounded dialogue. In it, agents can perceive, emote, and act whilst conducting dialogue with other agents. Models and humans can both act as characters within the game. We describe the results of training state-of-the-art generative and retrieval models in this setting. We show that in addition to using past dialogue, these models are able to effectively use the state of the underlying world to condition their predictions. In particular, we show that grounding on the details of the local environment, including location descriptions, the objects (and their affordances), and the characters (and their previous actions) present within it, allows better predictions of agent behavior and dialogue. We analyze the ingredients necessary for successful grounding in this setting, and how each of these factors relates to agents that can talk and act successfully.